The goal of the exploratory analysis is to identify the key predictors, generate insights, and discover trends and patterns. Throughout this process we are mainly interested in detecting:

| ptidno | visit_num | apptdate | VLundetectable | viralload | viralloadc | Height |
|---|---|---|---|---|---|---|
| 52764 | 1 | 2017-01-01 | NA | NA | | NA |
| 52764 | 2 | 2017-04-25 | NA | NA | | 162 |
| 52764 | 3 | 2017-05-24 | NA | NA | | 161 |
| 52764 | 4 | 2017-06-28 | NA | NA | | 162 |
| 52764 | 5 | 2017-07-21 | NA | 40 | < | 162 |
| 52764 | 6 | 2017-08-22 | NA | NA | | 162 |
| 52764 | 7 | 2017-09-12 | NA | NA | | 162 |
| 52764 | 8 | 2017-10-16 | NA | NA | | 162 |
| 52764 | 9 | 2017-11-20 | NA | NA | | 162 |
| 52764 | 10 | 2017-12-18 | NA | NA | | 162 |
| 52764 | 11 | 2018-01-01 | NA | 40 | < | 162 |
| 52764 | 12 | 2018-02-19 | NA | NA | | 162 |
| 52764 | 13 | 2018-03-06 | NA | NA | | 162 |
| 52764 | 14 | 2018-05-07 | NA | 40 | < | 162 |
| 52764 | 15 | 2018-06-05 | NA | NA | | 162 |
| 52764 | 16 | 2018-09-10 | NA | NA | | 162 |
| 52764 | 17 | 2018-12-04 | NA | 40 | < | 162 |
| 52764 | 18 | 2019-01-01 | NA | NA | | 162 |
| 52764 | 19 | 2019-04-26 | NA | NA | | 173 |
| 52764 | 20 | 2019-07-17 | NA | NA | | 173 |
| 52764 | 21 | 2019-10-11 | NA | 40 | < | NA |
| 52764 | 22 | 2019-10-14 | NA | NA | | NA |
| 52764 | 23 | 2019-11-13 | NA | NA | | NA |

| variable | n_miss | pct_miss |
|---|---|---|
| nhif | 83396 | 96.147018 |
| vl_count3 | 68926 | 79.464595 |
| vl_count2 | 60657 | 69.931287 |
| vl_count1 | 48470 | 55.880929 |
| entrypoint | 46852 | 54.015541 |
| civilstatusENROL | 22158 | 25.545897 |
| bmi | 19876 | 22.914985 |
| Height | 19581 | 22.574881 |
| ageatarvstart | 11592 | 13.364385 |
| Weight | 6309 | 7.273629 |
| male | 0 | 0.000000 |
| Death | 0 | 0.000000 |
| transfer | 0 | 0.000000 |
| program | 0 | 0.000000 |
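
A per-variable missingness summary like the one above can be sketched in base R as follows. This is a minimal illustration, not the report's actual code: the data frame `df` and its values are made up, and only the output column names (`variable`, `n_miss`, `pct_miss`) mirror the table.

```r
# Toy data standing in for the patient-level data (values are made up).
df <- data.frame(
  Height = c(162, NA, 173, NA),
  Weight = c(56, 60, NA, 58),
  male   = c(0, 1, 1, 0)
)

# Number and percentage of missing values per column.
n_miss <- colSums(is.na(df))
miss_tab <- data.frame(
  variable = names(n_miss),
  n_miss   = as.integer(n_miss),
  pct_miss = 100 * as.numeric(n_miss) / nrow(df),
  stringsAsFactors = FALSE
)

# Sort so that the variables with the most missing values come first.
miss_tab <- miss_tab[order(-miss_tab$n_miss), ]
rownames(miss_tab) <- NULL
print(miss_tab)
```

Sorting by `n_miss` makes it easy to see which variables may be too sparse to use as predictors.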
To understand the relationship of each variable with the outcome variable:

Table 4: Socio-demographics by program

| | Total (N = 86,738) | AMPATH (N = 42,756) | FACES (N = 10,329) | IDI (N = 2,244) | KISESA (N = 1,306) | MASAKA (N = 4,267) | MBARARA (N = 2,963) | MOROGORO (N = 1,785) | RAKAI (N = 19,688) | TUMBI (N = 1,400) |
|---|---|---|---|---|---|---|---|---|---|---|
| Age at first exposure to ARV | | | | | | | | | | |
| Mean (SD) | 35.1 (±10.9) | 37.0 (±11.2) | 32.4 (±9.8) | 35.6 (±10.8) | 35.2 (±11.0) | 31.7 (±9.3) | 31.7 (±9.7) | 40.0 (±11.4) | 32.5 (±9.9) | 39.6 (±11.7) |
| Median (IQR) | 33.0 (26.9 - 41.3) | 35.4 (28.6 - 43.8) | 30.4 (25.3 - 37.2) | 33.6 (27.7 - 42.0) | 33.1 (27.4 - 40.6) | 30.0 (24.7 - 37.1) | 29.4 (24.5 - 36.2) | 39.3 (31.6 - 47.1) | 30.4 (25.0 - 37.9) | 38.7 (30.7 - 46.0) |
| Missing | 11,592 (13.4%) | 5,479 (12.8%) | 1,666 (16.1%) | 312 (13.9%) | 78 (6.0%) | 147 (3.4%) | 123 (4.2%) | 43 (2.4%) | 3,679 (18.7%) | 65 (4.6%) |
| Male gender | | | | | | | | | | |
| No | 54,332 (62.6%) | 27,028 (49.7%) | 6,496 (12.0%) | 1,316 (2.4%) | 780 (1.4%) | 2,673 (4.9%) | 1,836 (3.4%) | 1,097 (2.0%) | 12,224 (22.5%) | 882 (1.6%) |
| Yes | 32,406 (37.4%) | 15,728 (48.5%) | 3,833 (11.8%) | 928 (2.9%) | 526 (1.6%) | 1,594 (4.9%) | 1,127 (3.5%) | 688 (2.1%) | 7,464 (23.0%) | 518 (1.6%) |
| Death | | | | | | | | | | |
| No | 82,158 (94.7%) | 39,877 (48.5%) | 9,883 (12.0%) | 2,012 (2.4%) | 1,254 (1.5%) | 4,140 (5.0%) | 2,914 (3.5%) | 1,662 (2.0%) | 19,120 (23.3%) | 1,296 (1.6%) |
| Yes | 4,580 (5.3%) | 2,879 (62.9%) | 446 (9.7%) | 232 (5.1%) | 52 (1.1%) | 127 (2.8%) | 49 (1.1%) | 123 (2.7%) | 568 (12.4%) | 104 (2.3%) |
| Civil status at enrollment | | | | | | | | | | |
| Divorced | 1,877 (2.2%) | 422 (22.5%) | 0 (0.0%) | 8 (0.4%) | 28 (1.5%) | 837 (44.6%) | 98 (5.2%) | 141 (7.5%) | 297 (15.8%) | 46 (2.5%) |
| Legally Married | 37,068 (42.7%) | 20,314 (54.8%) | 6,091 (16.4%) | 1,078 (2.9%) | 808 (2.2%) | 1,377 (3.7%) | 1,650 (4.5%) | 280 (0.8%) | 5,242 (14.1%) | 228 (0.6%) |
| Living w/Partner | 1,017 (1.2%) | 591 (58.1%) | 91 (8.9%) | 173 (17.0%) | 1 (0.1%) | 0 (0.0%) | 0 (0.0%) | 2 (0.2%) | 144 (14.2%) | 15 (1.5%) |
| Never Married and Not Living w/Partner | 8,941 (10.3%) | 5,596 (62.6%) | 0 (0.0%) | 405 (4.5%) | 435 (4.9%) | 348 (3.9%) | 629 (7.0%) | 69 (0.8%) | 1,327 (14.8%) | 132 (1.5%) |
| Separated | 10,224 (11.8%) | 5,224 (51.1%) | 2,414 (23.6%) | 430 (4.2%) | 0 (0.0%) | 0 (0.0%) | 466 (4.6%) | 0 (0.0%) | 1,690 (16.5%) | 0 (0.0%) |
| Widowed | 5,453 (6.3%) | 3,777 (69.3%) | 803 (14.7%) | 140 (2.6%) | 32 (0.6%) | 45 (0.8%) | 118 (2.2%) | 38 (0.7%) | 463 (8.5%) | 37 (0.7%) |
| Missing | 22,158 (25.5%) | 6,832 (30.8%) | 930 (4.2%) | 10 (0.0%) | 2 (0.0%) | 1,660 (7.5%) | 2 (0.0%) | 1,255 (5.7%) | 10,525 (47.5%) | 942 (4.3%) |
| Patient transferred out | | | | | | | | | | |
| No | 80,287 (92.6%) | 39,477 (49.2%) | 9,109 (11.3%) | 2,055 (2.6%) | 964 (1.2%) | 3,681 (4.6%) | 2,675 (3.3%) | 1,661 (2.1%) | 19,580 (24.4%) | 1,085 (1.4%) |
| Yes | 6,451 (7.4%) | 3,279 (50.8%) | 1,220 (18.9%) | 189 (2.9%) | 342 (5.3%) | 586 (9.1%) | 288 (4.5%) | 124 (1.9%) | 108 (1.7%) | 315 (4.9%) |
| Weight (kg) at enrollment | | | | | | | | | | |
| Mean (SD) | 57.8 (±11.3) | 58.4 (±11.5) | 59.7 (±10.9) | 59.9 (±13.7) | 56.0 (±10.2) | 57.1 (±10.7) | 60.0 (±12.0) | 58.3 (±13.7) | 54.9 (±9.4) | 58.7 (±13.6) |
| Median (IQR) | 56.5 (50.0 - 64.0) | 57.0 (50.0 - 65.0) | 59.0 (52.0 - 65.0) | 58.0 (50.1 - 67.0) | 55.0 (50.0 - 61.0) | 55.8 (50.0 - 63.0) | 58.0 (51.0 - 66.0) | 56.0 (49.0 - 65.0) | 54.0 (49.0 - 60.0) | 57.0 (50.0 - 66.0) |
| Missing | 6,309 (7.3%) | 3,426 (8.0%) | 131 (1.3%) | 90 (4.0%) | 2 (0.2%) | 91 (2.1%) | 89 (3.0%) | 29 (1.6%) | 2,388 (12.1%) | 63 (4.5%) |
| BMI at enrollment | | | | | | | | | | |
| Mean (SD) | 21.5 (±4.1) | 21.2 (±4.1) | 21.4 (±3.8) | 22.8 (±5.2) | 20.8 (±3.7) | 21.9 (±4.0) | 23.1 (±4.6) | 23.1 (±5.3) | 21.4 (±3.6) | 22.1 (±5.3) |
| Median (IQR) | 20.9 (18.7 - 23.5) | 20.6 (18.5 - 23.2) | 20.9 (18.9 - 23.3) | 21.8 (19.3 - 25.3) | 20.3 (18.4 - 22.7) | 21.3 (19.2 - 23.8) | 22.4 (20.0 - 25.5) | 22.2 (19.6 - 25.6) | 21.0 (19.1 - 23.2) | 21.3 (18.5 - 24.7) |
| Missing | 19,876 (22.9%) | 9,227 (21.6%) | 158 (1.5%) | 163 (7.3%) | 2 (0.2%) | 1,284 (30.1%) | 438 (14.8%) | 33 (1.8%) | 8,506 (43.2%) | 65 (4.6%) |

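
The continuous rows of Table 4 (mean, SD, median, IQR, and missing count, overall and per program) could be computed along the following lines. This is only a sketch in base R: the toy values are invented, the helper `summarise_cont` is hypothetical, and the package actually used to build the table is not shown in the source.

```r
# Toy data; 'program' and 'ageatarvstart' reuse column names from the tables,
# but the values are made up.
df <- data.frame(
  program       = c("AMPATH", "AMPATH", "AMPATH", "FACES", "FACES", "FACES"),
  ageatarvstart = c(30, 40, NA, 25, 35, 45),
  stringsAsFactors = FALSE
)

# Hypothetical helper: descriptive statistics for one numeric vector.
summarise_cont <- function(x) {
  q <- quantile(x, c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  c(mean = mean(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE),
    median = q[2], q1 = q[1], q3 = q[3],
    n_missing = sum(is.na(x)))
}

# One row per program, plus an overall row.
by_program <- t(sapply(split(df$ageatarvstart, df$program), summarise_cont))
overall    <- summarise_cont(df$ageatarvstart)
print(round(rbind(Total = overall, by_program), 1))
```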
Patient timeline
To understand the relationship of each continuous variable with the outcome (y) variable:
[Plots not shown. One panel per continuous variable: Height, Weight, bmi, ageatarvstart, ageatarvstartTHERAP, num_days_to_vl1, num_days_to_vl2, num_days_to_vl3, vl_count1, vl_count2, vl_count3, num_encs_to_vl1, num_encs_to_vl2, num_encs_to_vl3.]
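
One way to relate each continuous variable to the outcome is a loop that draws one boxplot per variable, grouped by outcome. The sketch below assumes a binary outcome column named `vl_failure1` (a name that appears among the categorical variables, although the source does not state which column is the outcome) and uses invented values.

```r
# Toy data; treating 'vl_failure1' as the outcome is an assumption.
df <- data.frame(
  vl_failure1 = c(0, 0, 0, 1, 1, 1),
  Weight      = c(55, 60, 65, 48, 50, NA),
  bmi         = c(21, 22, 23, 18, 19, 20)
)
cont_vars <- c("Weight", "bmi")

pdf(NULL)  # null device, so the sketch also runs non-interactively
group_medians <- list()
for (v in cont_vars) {
  print(v)
  # Boxplot of the variable split by outcome group.
  boxplot(reformulate("vl_failure1", response = v), data = df, main = v)
  # Numeric companion to the plot: median per outcome group.
  group_medians[[v]] <- tapply(df[[v]], df$vl_failure1, median, na.rm = TRUE)
}
invisible(dev.off())
print(group_medians)
```

In an interactive session or a knitted report, drop the `pdf(NULL)` and `dev.off()` lines so the plots are displayed.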
For each continuous column, visually check the following:
[Plots not shown. One panel per continuous variable: Height, Weight, bmi, ageatarvstart, ageatarvstartTHERAP, num_days_to_vl1, num_days_to_vl2, num_days_to_vl3, vl_count1, vl_count2, vl_count3, num_encs_to_vl1, num_encs_to_vl2, num_encs_to_vl3.]
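
The specific checks are not listed in the text. As an assumption, two common ones are the shape of the distribution and the presence of outliers. A minimal base R sketch with invented values, using a histogram for the shape and the boxplot rule (1.5 times the hinge spread) for outliers:

```r
# Toy data with one implausible height (values are made up).
df <- data.frame(
  Height = c(150, 160, 162, 165, 170, 250),
  Weight = c(50, 55, 57, 60, 62, NA)
)

pdf(NULL)  # null device, so the sketch also runs non-interactively
outliers <- list()
for (v in names(df)) {
  x <- df[[v]]
  hist(x, main = v, xlab = v)             # distribution shape
  outliers[[v]] <- boxplot.stats(x)$out   # points beyond the whiskers
}
invisible(dev.off())
print(outliers)
```

Values flagged this way are candidates for review, not automatic deletions; a height of 250 cm is likely a data-entry error, but that judgment needs domain knowledge.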
For categorical variables, visually check the frequencies and percentages.
## [1] "vl_failure2" "vl_failure1" "male" "Death"
## [1] "civilstatusENROL" "transfer" "entrypoint" "nhif"
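
Frequencies and percentages for the categorical variables can be tabulated with base R's `table()`. The sketch below uses invented values and a hypothetical helper, `freq_table`; missing values are counted as their own level so that the percentages sum to 100.

```r
# Toy data; column names come from the list above, values are made up.
df <- data.frame(
  male             = c(0, 1, 0, 0, 1),
  civilstatusENROL = c("Widowed", NA, "Separated", "Widowed", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: counts and percentages per level, NA included.
freq_table <- function(x) {
  counts <- table(x, useNA = "ifany")
  data.frame(
    level = names(counts),
    n     = as.integer(counts),
    pct   = round(100 * as.integer(counts) / length(x), 1),
    stringsAsFactors = FALSE
  )
}

for (v in names(df)) {
  print(v)
  print(freq_table(df[[v]]))
}
```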